What is your corpus, why did you choose it, and what do you think is interesting about it?
Last.fm is an online music database, music recommender system, and social networking service, founded back in the days when MSN, Myspace, and Runescape were still a thing. The website offers a plugin for your PC and phone that tracks your listening behaviour. Each listen, called a scrobble, is sent (or “scrobbled”) to the database and displayed on your personal profile. Based on the collected data, last.fm can also recommend new music or connect you with people who have a similar taste in music. Although the social aspects have been watered down, I’ve been using the service ever since June 2011 (my profile). With such a vast amount of data at hand, it would be a waste to leave it untouched. That is why I’m interested in learning how my listening has changed over the years.
As of December 31st, 2020, I have over 97,000 registered scrobbles and roughly 24,000 unique tracks spanning ten years. This is too large for the scope of this course, so I will limit the corpus to a fixed number of top tracks per year. This makes the data easier to explore without losing much of the overview of my general listening behaviour.
What are the natural groups or comparison points in your corpus and what is expected between them?
My corpus is divided into years, from June 2011 to the end of 2020. According to a New York Times article, our musical taste is established during our formative teenage years. If that is the case, I’d expect a music style from my teenage years to show up throughout my corpus. Apart from that, I still expect my music preferences to change as I grow older.
How representative are the tracks in your corpus for the groups you want to compare?
I used Spotlistr and Soundiiz to transfer 60 tracks per year from my last.fm profile to Spotify. Because at some points in my life I listened to whole albums more than to individual songs, a single album could fill much of a year’s chart. To broaden the scope, I took the top 10 tracks plus 50 tracks picked at random from ranks #11-100. Sometimes the tools picked the wrong track due to changes in the metadata, which I corrected manually. Examples include band name changes, such as ‘Viet Cong’ to ‘Preoccupations’ and ‘Andrew Jackson Jihad’ to ‘AJJ’. If a top 10 song was missing, the next song (#11 and so on) was selected instead. I also removed tracks that are considered intros or interludes.
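The selection rule above can be sketched as follows. This is a minimal illustration, not the actual Spotlistr/Soundiiz workflow: `select_tracks` and the `is_available` predicate are hypothetical, and the chart is assumed to be a list of track names ordered by scrobble count.

```python
import random

def select_tracks(chart, is_available, n_top=10, n_random=50, pool_end=100, seed=None):
    """Hypothetical helper: pick n_top top tracks plus n_random random tracks.

    chart: track names ordered by scrobble count (rank 1 first).
    is_available: returns False for tracks missing on Spotify or for
    intros/interludes, which are skipped.
    """
    # Only the top `pool_end` ranks are considered, minus unusable tracks.
    candidates = [t for t in chart[:pool_end] if is_available(t)]
    # Walking down the chart means #11 replaces a missing top-10 track, and so on.
    top = candidates[:n_top]
    # The remaining usable tracks form the pool for random sampling.
    rest = candidates[n_top:]
    rng = random.Random(seed)
    sampled = rng.sample(rest, min(n_random, len(rest)))
    return top + sampled
```

Passing a `seed` makes the random part reproducible, which helps when the selection has to be redone after fixing metadata.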
My corpus comes with a few limitations:
I only started using the last.fm plugin on my smartphone in February 2015. Before then, I relied on my PC/laptop to log my scrobbles, which makes the data between 2011 and 2014 less accurate.
Possible under-representation of certain music styles in my 60-track selections. A simple example: song 1 (style A) has 100 scrobbles, while songs 2 and 3 (style B) have 60 each. Song 1 ranks highest, yet style B has more scrobbles in total (120 versus 100), so a selection based on track rank does not necessarily reflect how much I listened to each style.
On a handful of occasions, I fell asleep with my music playing and last.fm still scrobbling.
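The style example can be checked with a few lines of arithmetic. The toy chart mirrors the example; the cutoff of one track is a hypothetical stand-in for any rank-based selection.

```python
from collections import Counter

# Toy chart from the example: (track, style, scrobbles), sorted by scrobbles.
chart = [("song 1", "A", 100), ("song 2", "B", 60), ("song 3", "B", 60)]

def style_totals(tracks):
    """Sum scrobbles per style."""
    totals = Counter()
    for _, style, plays in tracks:
        totals[style] += plays
    return totals

full = style_totals(chart)      # all listening: B has 120, A has 100
top1 = style_totals(chart[:1])  # a hypothetical top-1 cutoff only sees style A
```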
Identify several tracks in your corpus that are either extremely typical or atypical
Typical songs:
Processed by the Boys - Protomartyr: Post-punk, the style of songs like this one, has dominated my corpus since 2011.
Ferrum - Chihei Hatakeyama: Typical of the ambient music I listen to when I need to focus during work or study (especially since university).
Atypical songs:
Setsuyakuka - Tricot: Classified as math rock, a genre that gained prominence in my corpus starting in 2016.
Rosebud - U.S. Girls: I don’t listen to much music that is considered ‘pop’. This track, however, ranks high in my last.fm charts.
Gruppa Krovi - Kino: Together with Tricot, one of only two groups in my all-time top 10 that do not sing in English.
View my corpus per year:
The interactive bar plot shows how many scrobbles were recorded from June 11th, 2011 to December 31st, 2020, with the total at the top of each bar. Zooming in on the bottom of the bars reveals the top 5 most-listened-to artists of each year. The plot also marks events that could have influenced my listening behaviour.
Looking at all my recorded scrobbles from start to finish, you can see that the count increased significantly in 2015 and peaked in 2016 at 17,773. As I mentioned in the previous slide, this increase can be explained by the fact that I started using the last.fm plugin on my phone after buying one that supported it (monthly scrobbles went from 249 in December 2014 to 1,635 in January 2015). Another possible reason is that 2015 marked a new chapter as a university student: I spent more time listening to music during study sessions and made an effort to explore new music, spending (too much) on live music events.
From then on, scrobbles declined continuously, down to 4,492 in 2020. The COVID-19 pandemic seems to have influenced my scrobbles: the monthly average over the six months following the March lockdown was around 375, compared to around 540 in the six months before.
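The yearly totals and the before/after lockdown averages both come down to counting scrobbles per month. A minimal stdlib sketch, assuming the scrobble timestamps are available as `datetime` objects; the exact window boundaries are an assumption here (the six months "after" start with the lockdown month itself), since the text does not state them.

```python
from collections import Counter
from datetime import datetime

def monthly_counts(timestamps):
    """Count scrobbles per (year, month)."""
    return Counter((ts.year, ts.month) for ts in timestamps)

def yearly_totals(monthly):
    """Aggregate monthly counts into yearly totals (the bar heights)."""
    totals = Counter()
    for (year, _), n in monthly.items():
        totals[year] += n
    return totals

def shift_month(year, month, delta):
    """Return the (year, month) that lies `delta` months away."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1

def window_average(monthly, year, month, n_months, before):
    """Mean monthly scrobbles in the n_months before (year, month),
    or in the n_months starting at (year, month) when before=False."""
    offsets = range(-n_months, 0) if before else range(0, n_months)
    months = [shift_month(year, month, d) for d in offsets]
    return sum(monthly.get(m, 0) for m in months) / n_months
```

For the lockdown comparison, the two calls would be `window_average(m, 2020, 3, 6, before=True)` and `window_average(m, 2020, 3, 6, before=False)`.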
On a surface level, other interesting developments appear in the top 5 most-listened-to artists over the years. In 2011 and 2012, my top 5s were dominated by UK artists (9 out of 10). The top 5s of 2013 and 2014 were more diverse, with 4 out of 10 artists coming from outside the UK. My exposure to music produced in East Asia has had a noticeable effect on my top 5s since late 2018. This music became so dominant that it took over the entire top 5 in 2019 and most of it in 2018 and 2020.
Spotify provides a number of track-level features that characterize the tracks available on its platform. The plot on the left shows how these features developed over time, using the mean of each feature over the 60 selected tracks per year.
Speechiness and liveness hardly changed. The low speechiness reflects the fact that I am (proportionally) not much of a rap/hip-hop listener: according to Spotify, tracks that contain both music and speech, such as rap, typically score between 0.33 and 0.66. Valence rose slightly early on but stabilized around 0.5, whereas energy shifted noticeably upwards in 2016. More on this in the next slide.
Acousticness increased steadily in the early years, but declined after 2014 and has not exceeded 0.2 since. One feature seems to have increased permanently: instrumentalness, which has mostly hovered above 0.3 since 2015. This could be explained by my growing interest in instrumental music, such as Toe and GoGo Penguin. Another genre that could have contributed to the higher instrumentalness around 2015 is ambient (Klara Lewis, Ryuichi Sakamoto), as this was my preferred genre during study sessions. The introduction of this genre could also have pushed acousticness further down since 2015.
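The yearly means behind the feature plot can be sketched as a simple group-by. This assumes each track is a dict with a `year` key and one key per feature; the feature list below is an assumption limited to the features discussed on this slide.

```python
from collections import defaultdict
from statistics import mean

# Assumed feature names (Spotify audio-feature keys mentioned on this slide).
FEATURES = ["energy", "valence", "speechiness", "liveness",
            "acousticness", "instrumentalness"]

def yearly_feature_means(tracks, features=FEATURES):
    """Group tracks by year and average each feature within the year."""
    by_year = defaultdict(list)
    for track in tracks:
        by_year[track["year"]].append(track)
    return {year: {f: mean(t[f] for t in rows) for f in features}
            for year, rows in sorted(by_year.items())}
```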
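Spotify’s documentation describes speechiness in three bands: above 0.66 is probably spoken word, 0.33 to 0.66 mixes music and speech (including rap), and below 0.33 is mostly music. A small helper (hypothetical, for illustration) makes those thresholds explicit:

```python
def speechiness_band(value):
    """Bucket a Spotify speechiness score (0-1) using Spotify's documented thresholds."""
    if value > 0.66:
        return "mostly spoken word"
    if value >= 0.33:
        return "music and speech (e.g. rap)"
    return "mostly music"
```

A corpus whose yearly mean stays well below 0.33 is therefore dominated by tracks in the "mostly music" band.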
Spotify’s energy and valence translated into four moods
The ‘mood’ of each track in every year can be shown by plotting energy against valence. Instrumentalness (point size) and mode (colour) are displayed as well. The plot is interactive, so you can zoom in on individual tracks.
Overall, most songs in every year skew towards high energy (> 0.5), especially between 2016 and 2020. Valence, on the other hand, spans the entire range. This suggests that the music I listened to over the past ten years is mostly happy or angry.
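The four quadrants can be expressed as a small mapping. This is a sketch assuming a split at 0.5, the midpoint of Spotify’s 0-1 energy and valence scales, with the quadrant names used in the text (happy, angry, sad, relaxed); the plot itself may draw its boundaries differently.

```python
from collections import Counter

def mood(energy, valence, threshold=0.5):
    """Map Spotify energy/valence (both 0-1) to one of four mood quadrants.
    The 0.5 threshold is an assumption (midpoint of the scale)."""
    if energy >= threshold:
        return "happy" if valence >= threshold else "angry"
    return "relaxed" if valence >= threshold else "sad"

def mood_counts(tracks):
    """Count quadrants for an iterable of (energy, valence) pairs."""
    return Counter(mood(e, v) for e, v in tracks)
```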
An interesting observation is that my listening gradually became ‘sadder’ from 2012 to 2014. It is also striking that my tracks rarely venture deep into the ‘relaxed’ quadrant, with the exception of one song in 2015 (Jùhachi). The point sizes clearly show that instrumentalness increased substantially from 2015 onwards, as mentioned earlier. The instrumentalness feature is, however, not always accurate: in 2018, many high-energy songs are categorized as highly instrumental while clearly having vocals.
Based on these three visualizations, my logs can be divided into (give or take) four periods:
- “2011-2012” or ‘Mid-VWO’ era: UK-produced music (see A1)
- “2013-2014” or ‘Late-VWO’ era: early diversity (A1), sadder music (A3)
- “2015-2018” or ‘UvA’ era: most recorded scrobbles (A1), peak energy (A2), increased instrumentalness (A2)
- “2019-2020” or ‘Post-HK’ era: music produced in East Asia (A1)
It seems that one particular genre keeps coming back throughout the years: post-punk. In the corpus, this genre is represented by Joy Division (Mid-VWO), Kino (Late-VWO), Preoccupations (UvA), and Protomartyr (UvA/Post-HK). This may be what the New York Times writer meant by our musical taste being shaped in our teens. I will therefore examine the most-listened-to song by each of these bands in the following slides.
Chromagrams
On the left, you can see six chromagrams that show which pitch classes are prominent in each time interval. The left and middle columns contain songs by the post-punk bands that were dominant in my corpus, as described in the previous slide. The right column contains songs I selected for their high instrumentalness.
Each chromagram shows clear patterns. Joy Division’s “Transmission” is dominated by C and D, which is likely due to the bass guitar. In “A Star Called Sun”, the pitch classes are more varied (C, D, A). You can also see three repeated G blocks that correspond to the singer switching between pitches. “Continental Shelf” sounds “noisier” than the other songs, which is visible in its chromagram as less contrasted colours. Still, you can see that the song switches between G/B and D/E. “Processed by the Boys” is less “noisy”, but shows some irregularities in its preferred pitch classes. In several sections D/E/F/G dominate, but in two places pitch class A is used extensively, followed by a shorter G.
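A chromagram folds all octaves onto 12 pitch classes, which is why a bass line and a vocal melody on the same note land in the same row. The sketch below illustrates that folding from a hypothetical list of spectral peaks (frequency, magnitude); it is not how Spotify computes its segment pitch vectors, only a simplified version of the same idea, scaled so the strongest pitch class is 1.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class_index(freq_hz, a4_hz=440.0):
    """Fold a frequency onto one of 12 pitch classes (0 = C), discarding the octave."""
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12

def chroma_vector(peaks):
    """Sum peak magnitudes per pitch class and scale so the strongest class is 1."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class_index(freq_hz)] += magnitude
    strongest = max(chroma) or 1.0  # avoid dividing by zero for silence
    return [value / strongest for value in chroma]
```

For example, peaks at 110 Hz and 220 Hz (both A) add up in the A bin, while a weaker D at about 146.83 Hz shows up as a fainter D.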
This graph divides all 600 songs evenly into five two-year periods and shows each period as a key/mode histogram, to see which keys and modes were dominant. The years 2011-2012 look the least random, with an emphasis on G and A and, to a lesser extent, D and E. D# is clearly the least common key throughout my corpus. In later periods there seems to be less of a preferred key, as many keys reach no more than 12-13 songs.
As for modes, there seems to be a slight preference for major. One interesting observation is the key of B in 2013-2014, which contains only minor-key songs. 2014 in particular was a low-valence year (see A2/A3), which could explain this.
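The key/mode histogram can be sketched with Spotify’s integer encoding: `key` is a pitch class from 0 (C) to 11 (B), with -1 when no key was detected, and `mode` is 1 for major and 0 for minor. The `(year, key, mode)` input format and the helper names are assumptions for illustration.

```python
from collections import Counter

KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def period_of(year, start=2011, span=2):
    """Map a year to a two-year period label such as '2013-2014'."""
    first = start + (year - start) // span * span
    return f"{first}-{first + span - 1}"

def key_mode_histogram(tracks):
    """Count (period, key, mode) combinations for (year, key, mode) tuples."""
    hist = Counter()
    for year, key, mode in tracks:
        if key == -1:  # Spotify uses -1 when no key was detected
            continue
        hist[(period_of(year), KEY_NAMES[key], "major" if mode == 1 else "minor")] += 1
    return hist
```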
Self-Similarity Matrices
On the left are self-similarity matrices of the same songs as in B1, based on chroma and timbre features. Settings: segments are summarized per bar, normalization is Euclidean, and the summary statistic is the root mean square. The darker a cell, the more similar the two segments it compares.
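The settings above can be sketched in plain Python: summarize the segment vectors within each bar with the root mean square, normalize each bar vector to unit Euclidean length, and compare every bar with every other bar. The input format and the order of summarizing and normalizing are assumptions; the course toolkit may order these steps differently.

```python
import math

def rms_summary(frames):
    """Summarize one bar's feature vectors with the root mean square per dimension."""
    n = len(frames)
    return [math.sqrt(sum(f[d] ** 2 for f in frames) / n) for d in range(len(frames[0]))]

def euclidean_normalise(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def self_similarity(bars):
    """bars: list of bars, each a list of segment vectors (chroma or timbre).
    Returns pairwise Euclidean distances; 0 (darkest) means identical bars."""
    vecs = [euclidean_normalise(rms_summary(b)) for b in bars]
    return [[math.dist(a, b) for b in vecs] for a in vecs]
```

Repeated sections (for example, a recurring chorus) then appear as dark diagonal stripes away from the main diagonal.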
Findings:
…
…
Things to consider: Compare above to top track of years 2016, 2018 (Rosebud) or other/more years.
Cepstrograms
c01 is known to represent ‘loudness’, c02 the low frequencies, c03 the mid range, and c04 ‘noise’. The higher components are less clear and become less and less visible.
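Spotify documents the first of its 12 timbre dimensions as representing average loudness. Its timbre features are proprietary, so the sketch below only illustrates the general cepstral idea with a simplified MFCC-style computation (a DCT-II of log band energies): the first coefficient is the sum of the log energies, so making the whole sound louder shifts only that coefficient. The band energies are hypothetical inputs.

```python
import math

def cepstrum(band_energies):
    """DCT-II of log band energies (a simplified cepstral vector).
    Index 0 corresponds to c01 in the slide's 1-based naming."""
    logs = [math.log(e) for e in band_energies]
    n = len(logs)
    return [sum(x * math.cos(math.pi * k * (i + 0.5) / n) for i, x in enumerate(logs))
            for k in range(n)]
```

Doubling every band energy adds log 2 to each log value; because the higher cosine bases sum to zero across the bands, only the first coefficient changes.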
c01
Loudness (c01) outlier: “Dead Cowboy” by Lightning Bolt. The drops in c09 at around 190 and 300 seconds coincide with sections without drums.
c11: marks the start of the chorus, followed by instrumentals and occasional backing vocals.
On the left is a histogram of the tempo of every song per two-year period. The average tempos of the periods are similar, ranging between 125 and 135 bpm. 2015-2016 stands out slightly at around 134 bpm, which coincides with 2016 having the highest energy (see A3).
For the tempogram analysis, a typical post-punk song (“Unconscious Melody”) and an atypical math-rock song (“Setsuyakuka”) were selected to compare their rhythmic differences. The tempo of “Unconscious Melody” is mostly constant at around 220 bpm, with occasional switches towards 470 bpm.
The tempo of “Setsuyakuka”, however, is less constant. Although it is relatively clear around 380 bpm in the first half, it is all over the place in the second. The section between 150 and 220 seconds is especially diverse compared to the rest of the song. Listening to the song, it is possible that the tempogram had a harder time estimating the tempo there, as that section sounds less ‘noisy’ than the other parts.
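A tempogram estimates a local tempo for each window of time. The document does not state which tempogram variant was used; the sketch below swaps in a simplified autocorrelation approach on a hypothetical onset-novelty curve sampled at `fs` frames per second. Because lags at multiples of the beat period also score, tempograms in general tend to show energy at related (for example, double or half) tempos.

```python
def local_tempo(novelty, fs, min_bpm=60, max_bpm=480):
    """Estimate one tempo (bpm) from an onset-novelty curve via autocorrelation.
    Assumes the curve is longer than the smallest lag considered."""
    min_lag = max(1, int(fs * 60 / max_bpm))
    max_lag = int(fs * 60 / min_bpm)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(novelty) - 1) + 1):
        score = sum(novelty[i] * novelty[i + lag] for i in range(len(novelty) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60 * fs / best_lag

def tempogram(novelty, fs, window_s=8.0, hop_s=2.0):
    """Slide a window over the novelty curve and return (start time in s, local tempo)."""
    win, hop = int(window_s * fs), int(hop_s * fs)
    return [(start / fs, local_tempo(novelty[start:start + win], fs))
            for start in range(0, len(novelty) - win + 1, hop)]
```

A steady song yields nearly the same tempo in every window, whereas a song with changing rhythms yields values that jump between windows.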
Random forest confusion matrix:
- Periods 2011-2012, 2013-2014, and 2019-2020 are most distinctive compared to others.
- 2017-2018 has a minor overlap with 2015-2016. In the context of my corpus, it could mean that my listening habits were similar between those years.
Random forest Tree:
- Timbre is important (c01, c03, and c11).
- Instrumentalness is also an important factor. This could be due to math rock (often without vocals in East Asia) and ambient music, both of which became prominent on my last.fm from 2015-2016 onwards.
Table 1. Precision and recall of the random forest model per period:
| Group | Precision | Recall |
|---|---|---|
| 2011-2012 | 0.444 | 0.467 |
| 2013-2014 | 0.824 | 0.875 |
| 2015-2016 | 0.330 | 0.300 |
| 2017-2018 | 0.353 | 0.342 |
| 2019-2020 | 0.484 | 0.492 |
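Precision and recall per period follow directly from the confusion matrix: precision divides the correct predictions for a period by everything predicted as that period, and recall divides them by everything that truly belongs to it. A minimal sketch with a hypothetical dict-of-dicts matrix; the counts in the test are made up and are not the model’s results.

```python
def precision_recall(confusion, labels):
    """confusion[true][pred] = count. Returns {label: (precision, recall)}."""
    scores = {}
    for label in labels:
        tp = confusion[label][label]
        predicted = sum(confusion[t][label] for t in labels)  # column total
        actual = sum(confusion[label][p] for p in labels)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores
```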
This plot shows how the top features identified by the random forest model separate the periods. Timbre component c03 is plotted against c11, with danceability shown as point size.
In general, differences between the periods are not very clear, although ‘2019-2020’ seems to skew towards the bottom-left corner.
- Most yellow dots in the bottom-left part are hip-hop tracks (Leo Wang, Matt Force). They are grouped with three other hip-hop tracks by Run The Jewels and Noname from ‘2017-2018’.
- Also interesting are the four songs with the highest c11 values and low danceability, which happen to be ambient (Chihei Hatakeyama, Ryuichi Sakamoto, Julianna Barwick).
Although the separation is not very distinct, the trained classifier seems able to pick out the hip-hop tracks of my Post-HK years.
Relevant link: https://www.theverge.com/2018/2/12/17003076/spotify-data-shows-songs-teens-adult-taste-music